Hive, Pig & HBase Performance Evaluation for Data Processing Applications
Abstract
Information extraction has received significant attention due to the rapid growth of unstructured data. Researchers need a low-cost, scalable, easy-to-use, and fault-tolerant platform for processing large volumes of data. It is therefore important to evaluate MapReduce-based frameworks for data processing applications. This paper presents a comparative study of HBase, Hive, and Pig. The processing time of each is measured on a data set using simple queries, and the performance of HBase, Hive, and Pig is then observed and evaluated from those results.
Similar resources
Distributed RDF Triple Store Using HBase and Hive
The growth of web data has presented new challenges regarding the ability to effectively query RDF data. Traditional relational database systems efficiently scale and query distributed data. With the development of Hadoop and its implementation of the MapReduce framework, along with HBase, a NoSQL data store, the semantics of processing and querying data have changed. Given the existing structure of ...
Data Mining over Large Datasets Using Hadoop in Cloud Environment
Data in web applications and social networking is growing drastically, and such data is referred to as Big Data. Hive queries integrated with Hadoop are used to generate report analyses for thousands of datasets. Retrieving those datasets takes a huge amount of time, and the approach lacks performance analysis. To overcome this problem the Market Basket Analysis ...
Data Quality for Web Log Data Using a Hadoop Environment
Solving data quality problems is important for data warehouse construction and operation. This paper is based on developing a web log warehouse. It proposes a data quality problem methodology for data preprocessing within the log warehouse. It provides a hierarchical data warehouse architecture that is suitable for resource saving and ad hoc requirements. The data preprocessing is completed usi...
Rank Join Queries in NoSQL Databases
Rank (i.e., top-k) join queries play a key role in modern analytics tasks. However, despite their importance and unlike centralized settings, they have been completely overlooked in cloud NoSQL settings. We attempt to fill this gap: We contribute a suite of solutions and study their performance comprehensively. Baseline solutions are offered using SQL-like languages (like Hive and Pig), based on...
Optimizing data management for MapReduce applications on large-scale distributed infrastructures
ions were developed based on MapReduce, with the goal of providing a simple-to-use interface for expressing database-like queries [64, 6]. Bioinformatics is one of the numerous research domains that employ MapReduce to model their algorithms [69, 58, 56]. As an example, CloudBurst [69] is a MapReduce-based algorithm for mapping next-generation sequence data to the human genome and other referenc...
Publication date: 2016